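Drones can provide a minimally constrained, adaptive camera view to support robot telemanipulation. Furthermore, the drone view can be automated to reduce the operator's burden during teleoperation. However, existing approaches do not address two important aspects of using a drone as an automated view provider. The first is how the drone should select among a range of quality viewpoints within the workspace (e.g., opposite sides of an object). The second is how to compensate for unavoidable drone pose uncertainty. In this paper, we provide a nonlinear optimization method that yields effective and adaptive drone viewpoints for telemanipulation with an articulated manipulator. Our first key idea is to use sparse human-in-the-loop input to switch between multiple automatically generated drone viewpoints. Our second key idea is to introduce optimization objectives that maintain a view of the manipulator while accounting for drone uncertainty and its impact on viewpoint occlusion and environment collisions. We provide an instantiation of our drone viewpoint method within a drone-manipulator remote teleoperation system. Finally, we provide an initial validation of our method on tasks involving common household and industrial manipulations.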
translated by 谷歌翻译
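Remotely programming robots to execute tasks often relies on registering objects of interest in the robot's environment. Frequently, these tasks involve articulating objects, such as opening or closing a valve. However, existing human-in-the-loop methods for registering objects do not consider articulation and its corresponding impact on object geometry, which can cause the methods to fail. In this work, we present an approach in which the registration system attempts to automatically determine the object model, pose, and articulation for user-selected points using nonlinear fitting and the iterative closest point algorithm. When the fit is incorrect, the operator can iteratively intervene with corrections, after which the system refits the object. We present an implementation of our fitting procedure for one-degree-of-freedom (DOF) objects with revolute joints and evaluate it in a user study, which shows that it can improve user performance in terms of time on task, task load, ease of use, and usefulness compared to a manual registration approach. We also present an example that integrates our method into an end-to-end system for articulating a remote valve.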
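Recently, there has been a wealth of development in motion planning for robotic manipulation: new motion planners are continuously proposed, each with its own unique strengths and weaknesses. However, evaluating new planners is challenging, and researchers often create their own ad hoc problems for benchmarking, which is time-consuming, prone to bias, and does not allow direct comparison against other state-of-the-art planners. We present MotionBenchmaker, an open-source tool for generating benchmarking datasets for realistic robot manipulation problems. MotionBenchmaker is designed to be an extensible, easy-to-use tool that allows users to both generate datasets and benchmark them by comparing motion planning algorithms. Empirically, we show the benefit of using MotionBenchmaker as a tool to procedurally generate datasets, which helps in the fair evaluation of planners. We also provide a suite of 40 prefabricated datasets, with 5 different commonly used robots in 8 environments, to serve as a common ground for accelerating motion planning research.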
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
We present a dynamic path planning algorithm to navigate an amphibious rotor craft through a concave time-invariant obstacle field while attempting to minimize energy usage. We create a nonlinear quaternion state model that represents the rotor craft dynamics above and below the water. The 6-degree-of-freedom dynamics are used within a layered architecture to generate motion paths for the vehicle to follow and the required control inputs. The rotor craft has a 3-dimensional map of its surroundings that is updated via limited-range onboard sensor readings within the current medium (air or water). Path planning is done via PRM and D* Lite.
While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast-track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack with higher-level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervisor to loosely supervise the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and re-configurable communications.
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
The visual dimension of cities has been a fundamental subject in urban studies, since the pioneering work of scholars such as Sitte, Lynch, Arnheim, and Jacobs. Several decades later, big data and artificial intelligence (AI) are revolutionizing how people move, sense, and interact with cities. This paper reviews the literature on the appearance and function of cities to illustrate how visual information has been used to understand them. A conceptual framework, Urban Visual Intelligence, is introduced to systematically elaborate on how new image data sources and AI techniques are reshaping the way researchers perceive and measure cities, enabling the study of the physical environment and its interactions with socioeconomic environments at various scales. The paper argues that these new approaches enable researchers to revisit the classic urban theories and themes, and potentially help cities create environments that are more in line with human behaviors and aspirations in the digital age.
Logic Mill is a scalable and openly accessible software system that identifies semantically similar documents within either one domain-specific corpus or multi-domain corpora. It uses advanced Natural Language Processing (NLP) techniques to generate numerical representations of documents. Currently it leverages a large pre-trained language model to generate these document representations. The system focuses on scientific publications and patent documents and contains more than 200 million documents. It is easily accessible via a simple Application Programming Interface (API) or via a web interface. Moreover, it is continuously being updated and can be extended to text corpora from other domains. We see this system as a general-purpose tool for future research applications in the social sciences and other domains.
The release of ChatGPT, a language model capable of generating text that appears human-like and authentic, has gained significant attention beyond the research community. We expect that the convincing performance of ChatGPT incentivizes users to apply it to a variety of downstream tasks, including prompting the model to simplify their own medical reports. To investigate this phenomenon, we conducted an exploratory case study. In a questionnaire, we asked 15 radiologists to assess the quality of radiology reports simplified by ChatGPT. Most radiologists agreed that the simplified reports were factually correct, complete, and not potentially harmful to the patient. Nevertheless, instances of incorrect statements, missed key medical findings, and potentially harmful passages were reported. While further studies are needed, the initial insights of this study indicate a great potential in using large language models like ChatGPT to improve patient-centered care in radiology and other medical domains.